arXiv:1501.02887v1 [cs.CV] 11 Jan 2015


Online Handwritten Devanagari Stroke Recognition 
Using Extended Directional Features 

Lajish VL
Department of Computer Science
University of Calicut,
Kerala - 673635, INDIA
E-mail: lajish@uoc.ac.in

Sunil Kumar Kopparapu
TCS Innovations Lab - Mumbai
Tata Consultancy Services Limited
Thane (West), Maharashtra, India
Email: SunilKumar.Kopparapu@TCS.Com


Abstract —This paper describes a new feature set, called the 
extended directional features (EDF) for use in the recognition of 
online handwritten strokes. We use EDF specifically to recognize 
strokes that form a basis for producing Devanagari script, which 
is the most widely used Indian language script. Stroke
recognition in handwritten script is analogous to phoneme
recognition in speech signals, where accuracy is generally very
poor (of the order of 20% for singing voice). Experiments are
conducted for the automatic recognition of isolated handwritten
strokes. We first describe the proposed feature set, EDF, and
then show how it can be used effectively for writer-independent
script recognition through stroke recognition. Experimental
results show that the extended directional feature set performs
well, with over 65% stroke-level recognition accuracy on a
writer-independent data set.

I. Introduction

Interest in handwritten script recognition, and
specifically in online handwritten script recognition,
has been active for a long time. In the case of Indian
languages, research is active especially for Devanagari,
Bangla, Telugu and Tamil.
Devanagari script, the most widely used Indian script, consists
of vowels and consonants as shown in Fig. 1. It is used as
the writing system for over 28 languages including Sanskrit, 
Hindi, Kashmiri, Marathi and Nepali and used by more than 
500 million people worldwide. Devanagari is written from left
to right in horizontal lines and the writing system is an
alphasyllabary. Barring a few, almost all the alphabets in
English can be written in one or two strokes^1. In contrast,
in most Indian languages, alphabets are made up of two or
more strokes. This makes it necessary to
analyze a sequence of adjacent strokes to identify an alphabet.
The majority of the alphabets in Devanagari script are formed
using multiple strokes. Language syllables are composed of
vowels, consonants and their combinations. In a consonant- 
vowel combination, the vowels are orthographically indicated 
by signs called matras. These modifier symbols are normally 
attached to the top, bottom, left or right of the base character. 
Hence the consonants, vowels, matras and consonant/vowel 
modifiers constitute the entire alphabet set. These composite
characters are then joined together by a horizontal line, called
shirorekha.

^1 A stroke is defined as the resulting trace between a pen-down and its
adjacent pen-up.

Fig. 2. A set of primitive handwritten strokes that can be used to write
the complete alphabet set in Devanagari.

A careful analysis based on clustering of handwritten
Devanagari script showed that a basis-like set of 50
strokes is sufficient to represent all the alphabets in
the Devanagari script. We name these strokes primitives.
The identified set of primitives (shown in Figure 2) can be
used to write the complete Devanagari alphabet set (Figure 1).
In this paper we use these primitives as the units for
recognition, drawing a parallel with the phone set
used in the speech recognition literature. In unconstrained
handwriting these primitive strokes exhibit large variability in
shape, direction and order of writing. It is also observed that
the primitives are combined and broken based on the writer's
style of writing, which is acquired at the time of learning the
script. A sample set of primitives collected from the same
writer at different times is shown in
Figure 3. The variations within the primitives even for the
same writer are evident, and it is observed that the variation
among different writers is even larger, making the task of
recognizing these primitives challenging.

Fig. 3. Sample set of primitives collected from a single writer.
(Columns: Sr. No., Character, Handwritten Samples 1 to 6.)

While a large amount of literature is available for online
handwriting recognition of English, Chinese and Japanese,
until recently relatively little work had been
reported for the recognition of Indian languages.
Even among the Indian scripts, notable work has been reported
only for Devanagari, Bangla, Tamil and Telugu scripts.
It is also observed that the work done on one Indian
language script cannot be directly applied to the recognition
of a second language script because of the vast variation in
the scripts. The main challenges in online handwritten character
recognition in Indian languages are the large size of the character
set, variation in writing style (when the same stroke is written
by different writers or by the same writer at different times) and
the visual similarity between different alphabets in the script.
A list of visually similar alphabets in Devanagari script is
shown in Figure 4.

In this paper, we propose the use of the extended directional
feature (EDF) set for the recognition of primitives (which
are also called strokes). The variations that exist in the
primitives (see Figure 3) test the credibility of the proposed
features. The motivation to look at recognition of strokes
rather than alphabet recognition is influenced by
the speech recognition literature. The strokes are analogous
to phonemes in speech. It is well known in the speech literature
that though phoneme recognition accuracies are poor (about
20% for singing voice), the final accuracy of the
speech recognizer is significantly higher. The poor phoneme
recognition is compensated for by lexicons
and statistical language models. We believe that even poor
stroke recognition accuracies can lead to very high alphabet
recognition accuracies when knowledge about the written
language is exploited. The rest of the paper is organized as
follows. We introduce the extended directional feature set,
together with the pre-processing, in Section II. Data collection
and experimental results are outlined in Section III, and
conclusions are given in Section IV.

Fig. 4. A list of some confusing alphabets in Devanagari.

II. Extended Directional Feature Extraction 

Several temporal features have been used for
script recognition in general and for online Devanagari script
recognition in particular. We propose a simple yet effective
feature set based on an extended directional chain code. The
detailed procedure for obtaining these directional features is
given below.

Let the online stroke be represented by a variable number
of 2D points in a time sequence. For example, an
online stroke would be represented as

\[ \{(x_{t_1}, y_{t_1}), (x_{t_2}, y_{t_2}), \cdots, (x_{t_n}, y_{t_n})\} \]

where $t$ denotes time and $t_1 < t_2 < \cdots < t_n$.
Equivalently, we can represent the online stroke (see Figure 5)
as

\[ \{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\} \]

by dropping the variable $t$. The number of points, denoted by
$n$, varies depending on the size of the stroke and also the speed
with which it was written. Most handwritten script digitizing
devices (popularly called electronic pens or e-pens) sample the
handwritten stroke uniformly in time. For this reason, the
number of points per unit length of a handwritten stroke is
large when the writing speed is slow, which is especially true
at curvatures (see Figure 5), and vice versa.

























Fig. 5. A sample online character. The markers represent the (x, y) points; the
points have been joined to give a feel for the stroke.

Fig. 6. After smoothing Figure 5 using Discrete Wavelet Transform.


We first identify the curvature points (also called critical
points) from the smoothed handwriting data (we use the discrete
wavelet transform, although any noise removal technique [20] could
have been used; see Figure 6). The sequence $\{(x_i, y_i)\}_{i=1}^{n}$ represents
the handwriting data of a stroke. We treat the sequences $x_i$ and
$y_i$ independently and compute the curvature points for each
of these sequences. For the $x$ sequence, we calculate the sign of
the first difference

\[ x'_i = \mathrm{sgn}(x_i - x_{i+1}) \]

where, with $k = x_i - x_{i+1}$,

\[ \mathrm{sgn}(k) = \begin{cases} +1 & \text{if } k > 0 \\ -1 & \text{if } k < 0 \\ 0 & \text{if } k = 0 \end{cases} \]

We use $x'$ to compute the curvature points. The point $i$ is a
curvature point iff

\[ x'_i - x'_{i+1} \neq 0. \]




Similarly, we calculate the curvature points for the $y$ sequence.
The final list of curvature points is the union of the
points marked as curvature points in the $x$ and the $y$
sequences (see Figure 7). Clearly, the curvature points are
more consistent in number and position, and occur where there
is a change in curvature, for a smoothed stroke
(Figure 7(b)) when compared to a raw stroke (see Figure 7(a)).
It must be noted that the position and number of curvature
points computed for different samples of the same stroke may
vary.
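
The sign-change rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the input is assumed to be already smoothed, and the sketch reports the sample shared by the two differences whose signs disagree (an indexing choice that is our assumption).

```python
def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def curvature_indices(seq):
    """Indices at which the sign of the first difference changes."""
    # d[i] = sgn(seq[i] - seq[i+1]) for consecutive samples
    d = [sign(a - b) for a, b in zip(seq, seq[1:])]
    # d[i] and d[i+1] share the sample i+1; report it when the signs differ
    return {i + 1 for i in range(len(d) - 1) if d[i] != d[i + 1]}


def curvature_points(points):
    """Union of the curvature points found in the x and y sequences."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    idx = sorted(curvature_indices(xs) | curvature_indices(ys))
    return [points[i] for i in idx]


stroke = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (5, 1)]
print(curvature_points(stroke))
```

For the zig-zag stroke in the example, x increases monotonically, so only the two turning points of the y sequence are reported.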

Let $k$ be the number of curvature points (denoted by
$c_1, c_2, \cdots, c_k$) extracted from a stroke of length $n$; usually
$k \ll n$. The $k$ curvature points form the basis for extraction
of the extended directional features. We first compute the angle
between two curvature points, say $c_l$ and $c_m$, as

\[ \theta_{lm} = \tan^{-1}\left( \frac{y_l - y_m}{x_l - x_m} \right) \]

The extended directional feature set is then obtained by
computing the direction between the curvature points as shown
in Figure 8, where $d_{lm}$, corresponding to the angle $\theta_{lm}$
(computed using Algorithm 1), is the direction between
the curvature points $c_l$ and $c_m$.


Algorithm 1 Conversion of the angle between two curvature points
into a direction

// θ is the angle in radians, assumed to lie in [-π, π]
int deg2dir(double θ)

int dir = -1;

if (θ >= -π/8 && θ < π/8) then
dir = 1;
end if

if (θ >= π/8 && θ < 3π/8) then
dir = 2;
end if

if (θ >= 3π/8 && θ < 5π/8) then
dir = 3;
end if

if (θ >= 5π/8 && θ < 7π/8) then
dir = 4;
end if

// direction 5 wraps around ±π
if ((θ >= 7π/8 && θ <= π) || (θ >= -π && θ < -7π/8)) then
dir = 5;
end if

if (θ >= -7π/8 && θ < -5π/8) then
dir = 6;
end if

if (θ >= -5π/8 && θ < -3π/8) then
dir = 7;
end if

if (θ >= -3π/8 && θ < -π/8) then
dir = 8;
end if

return(dir);
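
As an illustration, Algorithm 1 can be written as a short Python function. This is a sketch under stated assumptions: the angle is taken to be in radians in the closed interval [-π, π], each sector is treated as closed on its lower bound so that no angle in that interval is left unassigned, and the table-driven layout is our choice rather than the authors'.

```python
import math


def deg2dir(theta):
    """Map an angle in [-pi, pi] (radians) to a direction code 1..8."""
    if not -math.pi <= theta <= math.pi:
        raise ValueError("theta must lie in [-pi, pi]")
    step = math.pi / 8
    # (lower bound, upper bound, direction code), bounds in units of pi/8
    sectors = [(-1, 1, 1), (1, 3, 2), (3, 5, 3), (5, 7, 4),
               (-7, -5, 6), (-5, -3, 7), (-3, -1, 8)]
    for lo, hi, code in sectors:
        if lo * step <= theta < hi * step:
            return code
    # Remaining angles lie within pi/8 of +pi or -pi: direction 5
    return 5


print([deg2dir(a) for a in (0.0, math.pi / 2, math.pi, -math.pi / 2)])
```

The eight sectors are each π/4 wide and centred on multiples of π/4, so direction 1 points along the positive x axis and direction 5 along the negative x axis.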


Fig. 7. Curvature point extraction: (a) raw online character and (b) after
smoothing using Discrete Wavelet Transform.

Fig. 8. Extended Directional Features.

Fig. 9. Paragraph of online data collected from a user.

Given $k$ curvature points, we get (see Figure 8) an extended
directional feature (EDF) vector of size

\[ \frac{k(k-1)}{2}. \]

In all our experiments we have used this extended directional
feature set to represent a stroke.
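
A minimal Python sketch of the EDF construction follows. It makes several assumptions that are not stated in the text: the angle is computed with `math.atan2` so that it covers the full range [-π, π] used by Algorithm 1, the pairs of curvature points are visited in lexicographic order, and the direction code is obtained with a compact arithmetic form of the eight sectors. The function names are hypothetical.

```python
import math
from itertools import combinations


def direction_code(theta):
    """Direction 1..8 for an angle in [-pi, pi]; sectors are pi/4 wide."""
    sector = math.floor((theta + math.pi / 8) / (math.pi / 4))
    return sector % 8 + 1


def extended_directional_features(curvature_pts):
    """Direction codes for every pair (c_l, c_m) with l < m."""
    features = []
    for (xl, yl), (xm, ym) in combinations(curvature_pts, 2):
        theta = math.atan2(yl - ym, xl - xm)  # full-range angle (assumption)
        features.append(direction_code(theta))
    return features


pts = [(0, 0), (1, 0), (1, 1)]
edf = extended_directional_features(pts)
print(edf, len(edf) == len(pts) * (len(pts) - 1) // 2)
```

With three curvature points the vector has 3(3-1)/2 = 3 entries, in agreement with the size formula above.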

III. Experimental Analysis 

For experimental analysis, we collected data from 10 persons,
each of whom wrote a paragraph of Hindi text using a
Mobile e-Notes Taker (see, for example, Figure 9). The mobile
e-note taker is a portable pen-based handwriting capture device
which allows the user to write on normal paper using an
electronic pen to capture the online handwritten text. The
SDK provided with the device enables extraction of the (x, y)
trace of the online handwriting data. In addition to the (x, y)
trace, the pen captures the pen-up and pen-down information,
which helps identify a stroke. Each stroke is characterized
by an (x, y) sequence between a pen-down and a pen-up point.
This raw stroke-level data is smoothed using Discrete Wavelet
Transform (DWT) decomposition to remove noise in the form of
small undulations caused by the sensitivity of the sensors on
the electronic pen. As mentioned earlier, we do not dwell on the
smoothing in this paper since it is well covered in the pattern
recognition literature. For each stroke we extracted the extended
directional feature set as described in Section II.

We used the paragraph data of 5 users for training and that of the
other 5 (not part of training) for testing the performance
of the EDF set. We constructed a total of 252 ($\binom{10}{5}$) sets
of training and test data. We initially hand-tagged each stroke
in the collected data using the 50 primitives that we selected
(see Figure 2). The strokes that did not fall into this primitive
set were marked as being out of vocabulary. All the strokes
corresponding to a given primitive in the training data were
collected and clustered together. We retained those primitives
that occurred at least 10 times in both the training and the test
data; the rest of the primitives were not used for training or testing.
In all, we obtained 20 primitives which occurred at least
10 times in both the training and the test data set. While the
dataset is not very large, the 252 different runs demonstrate
the effectiveness of EDF in recognition of the primitives.
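
The count of 252 follows from choosing 5 training writers out of 10. A short Python sketch of how such splits could be enumerated is given below; the writer identifiers 0 to 9 and the function name are placeholders of our own.

```python
from itertools import combinations
from math import comb


def train_test_splits(writers, n_train):
    """Yield every (train, test) partition with n_train training writers."""
    writers = list(writers)
    for train in combinations(writers, n_train):
        test = tuple(w for w in writers if w not in train)
        yield train, test


splits = list(train_test_splits(range(10), 5))
print(len(splits), comb(10, 5))
```

Each split keeps the training and test writers disjoint, which is what makes the reported accuracies writer independent.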

As the next training step, we calculated the dynamic time
warping (DTW) distance between all strokes corresponding to
the same primitive, for each of the 20 primitives. Different strokes
corresponding to the same primitive have EDF vectors of different
lengths, and hence we need the DTW algorithm^2 to compute the
distance between two strokes. All strokes corresponding
to the same primitive which were within a distance of r were
clustered together, and only one stroke from each
cluster was retained as the cluster representative. We chose r
such that for each primitive there were a maximum of 3 sample
strokes. So for a set of 20 primitives we had a reference set
of 60 samples.
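
For readers unfamiliar with DTW, the following Python sketch shows the standard dynamic-programming recurrence applied to two direction-code sequences of different lengths. The local cost between two direction codes is not specified in the text; the circular difference used here (so that directions 1 and 8 are one step apart) is purely an assumption for illustration, as are the function names.

```python
def circular_cost(u, v):
    """Assumed local cost: steps between two direction codes on a ring of 8."""
    diff = abs(u - v)
    return min(diff, 8 - diff)


def dtw_distance(a, b, cost=circular_cost):
    """Classic DTW distance between sequences a and b (no window constraint)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cost of the best alignment of a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            acc[i][j] = step + min(acc[i - 1][j],      # a[i-1] repeats a match
                                   acc[i][j - 1],      # b[j-1] repeats a match
                                   acc[i - 1][j - 1])  # both sequences advance
    return acc[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because one element may be aligned with several elements of the other sequence, the two sequences in the example have distance zero despite their different lengths.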

For testing, we took a test stroke ($s_t$) from the test
data, extracted its EDF and compared it with the EDF of
the 60 reference strokes using the DTW algorithm. We chose two
different methods to assign the test stroke to one of the 20
primitives (classification).

• Method I: The stroke $s_t$ is classified as the primitive $p^*$
such that the DTW distance of $s_t$ from the primitive $p^*$
is minimum

\[ \min \left[ d(s_t, p_i) \right]_{i=1}^{60} \]

Note that $d(a, b)$ is the DTW distance between $a$ and $b$.

• Method II: The stroke $s_t$ is compared with all the 60
reference strokes and the distances $d(s_t, p_i)$ for $i = 1, \cdots, 60$
are computed. We then take the average distance of the stroke
from the 3 references of each primitive. We arrange
these average distances (20 in number) in increasing
order of magnitude. The primitive with the least average
distance from the test stroke $s_t$ is declared as the
recognized primitive for stroke $s_t$.
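
The two decision rules can be sketched as follows. The sketch assumes the reference set is a list of (primitive label, feature) pairs and that the distance function is passed in as a parameter; in the experiments described here that would be a DTW distance on EDF vectors, while the toy example uses scalar features and an absolute difference purely for illustration. All names are hypothetical.

```python
from collections import defaultdict


def classify_method_1(test_feat, references, dist):
    """Method I: label of the single nearest reference stroke."""
    label, _ = min(references, key=lambda ref: dist(test_feat, ref[1]))
    return label


def classify_method_2(test_feat, references, dist):
    """Method II: label with the smallest average distance over its references."""
    per_label = defaultdict(list)
    for label, feat in references:
        per_label[label].append(dist(test_feat, feat))
    return min(per_label, key=lambda lab: sum(per_label[lab]) / len(per_label[lab]))


# Toy data: scalar "features" and absolute difference as the distance
refs = [("a", 0), ("a", 10), ("b", 4), ("b", 5)]
toy_dist = lambda u, v: abs(u - v)
print(classify_method_1(1, refs, toy_dist), classify_method_2(1, refs, toy_dist))
```

In the toy example the nearest single reference belongs to primitive "a", whereas the average distance favours primitive "b", which shows how the two methods can disagree on an individual stroke.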

Table I shows the average number of strokes in the test
data set corresponding to the 20 selected primitives. All the
experimental results are based on this data set (from 10 people
but run 252 times). The overall average stroke-level recognition
accuracies for Method I and Method II did not differ
significantly and stood at 65.6% and 65.9% respectively.
That is, on average 410 (Method I) and 412 (Method
II) of the 625 strokes were correctly recognized. Details of the
average recognition are captured in Table II. It should be noted
that the accuracies are writer independent and for stroke-level
recognition.

^2 We do not discuss this algorithm in detail since it is widely used in the online
script recognition literature.

TABLE I
Average (over 252 runs) number of strokes under each primitive
in the test data set.
(Column: Primitive.)

TABLE II
Average (over 252 runs) recognition accuracies for the test data
set.
(Columns: Primitive; # Test; # recognized, Method I; # recognized, Method II.)

IV. Conclusions 

In this paper we have introduced a new online feature set,
called the extended directional feature. Based on extensive
experimentation (252 runs) on a set of strokes captured from
10 people, we observe that this feature set is capable
of discriminating similar-looking strokes quite well. We have
presented recognition accuracies for a writer-independent
stroke-level data set. It is well known, in both the speech and script
recognition literature, that stroke (phoneme in the case of speech)
recognition is generally poor. However, just as in speech, where
phone recognition is improved by using a lexicon and a
statistical language model, we plan to cluster strokes using
spatio-temporal information to form alphabets and then use
the cluster of strokes to classify them into an alphabet. This,
we believe, will lead to good accuracies for writer-independent
script recognition. Further, the approach is not
language dependent and can be used for recognition of other
languages, albeit with a different primitive set.